Feature/bq incremental strategy insert_overwrite #2153
Conversation
@jtcohen6 this is great! And self-contained! I think I just convinced myself that we should try to ship this for 0.16.0.... i can't think of any reason at all why we should not do that. Can you?
{%- set predicates = [] if predicates is none else [] + predicates -%}
{%- set dest_cols_csv = get_quoted_csv(dest_columns | map(attribute="name")) -%}

merge into {{ target }} as DBT_INTERNAL_DEST
i could live 1,000 more years and i would still not understand this DML...
I have lived 2 days and I have read the documentation, and I now understand this DML.
The key thing I was missing: when not matched by source is always true here because we're using a constant-false predicate. I was errantly thinking that we were still merging on a unique_key, which we are not.
From the docs:
If the merge_condition is FALSE, the query optimizer avoids using a JOIN. This optimization is referred to as a constant false predicate. A constant false predicate is useful when you perform an atomic DELETE on the target plus an INSERT from a source (DELETE with INSERT is also known as a REPLACE operation).
Cool!
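To make the mechanics concrete, the statement this strategy generates looks roughly like the sketch below. This is a hand-written illustration rather than the macro's literal output, and the dataset, column, and partition values are made up:

```sql
merge into `my-project`.`analytics`.`events` as DBT_INTERNAL_DEST
using (
    -- the new rows, built from the model SQL into a temp table
    select * from `my-project`.`analytics`.`events__dbt_tmp`
) as DBT_INTERNAL_SOURCE
-- constant false predicate: BigQuery never joins source to target
on FALSE

-- every target row is therefore "not matched by source"; the extra predicate
-- limits the delete to the partitions being replaced
when not matched by source
     and date(DBT_INTERNAL_DEST.event_time) in (date '2020-03-01', date '2020-03-02')
    then delete

-- every source row is "not matched", so all of the new rows are inserted
when not matched then
    insert (`event_time`, `user_id`, `page_path`)
    values (`event_time`, `user_id`, `page_path`)
```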
Neat! As part of finalizing my Discourse post about new BQ partitioning + incremental modeling in 0.16.0, I'm going to test this strategy on a (public) dataset of some size.
(Branch feature/bq-insert-overwrite updated from 273f0e2 to 0656477.)
The test failures here appear to be intermittent weirdness on Snowflake's end... Looks like it was returning Arrow data instead of JSON data? Merging this one for 0.16.0!
@jtcohen6 @drewbanin
This is a small feature that builds on top of the tremendous work from #2140. It shouldn't have any breaking changes, so I think we could ship it in 0.16.1. I'm opening this now so that I can link to it in a forthcoming post about dbt + BigQuery + incremental models.
A common request from the community is an incremental materialization on BigQuery to just drop and replace an entire day of data. By setting incremental_strategy = "insert_overwrite" in the config, any partition with new data will be completely dropped and recreated.

Example usage:
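Something along these lines — the model, column names, and partition values here are placeholders, so treat it as an illustrative sketch rather than copy-paste-ready config:

```sql
-- models/events_daily.sql

{% set partitions_to_replace = [
    'timestamp(current_date)',
    'timestamp(date_sub(current_date, interval 1 day))'
] %}

{{
    config(
        materialized = 'incremental',
        incremental_strategy = 'insert_overwrite',
        partition_by = {'field': 'session_start', 'data_type': 'timestamp'},
        partitions = partitions_to_replace
    )
}}

select
    session_start,
    user_id,
    page_path
from {{ ref('stg_events') }}

{% if is_incremental() %}
    -- only rebuild the partitions listed above
    where timestamp_trunc(session_start, day) in ({{ partitions_to_replace | join(',') }})
{% endif %}
```

With a config like this, each incremental run replaces only the listed partitions in the target table rather than merging on a unique_key.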